Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search
Text-based person search aims to retrieve the corresponding person images from
an image database given a sentence describing the person, and it holds great
potential for applications such as video surveillance.
Extracting the visual content that corresponds to the human description is the key to
this cross-modal matching problem. Moreover, correlated images and descriptions
involve semantic relevance at different granularities, which previous methods
usually ignore. To exploit the corresponding visual content at multiple levels,
we propose a pose-guided multi-granularity attention network (PMA). First, we
propose a coarse alignment network (CA) that uses similarity-based attention to
select the image regions related to the global description. To further capture
the visual body parts related to each phrase, we propose a fine-grained
alignment network (FA), which employs pose information to learn latent semantic
alignments between visual body parts and textual noun phrases. To verify the effectiveness
of our model, we perform extensive experiments on the CUHK Person Description
Dataset (CUHK-PEDES) which is currently the only available dataset for
text-based person search. Experimental results show that our approach
outperforms the state-of-the-art methods by 15% in terms of the top-1 metric.
Comment: Published in AAAI 2020 (oral).
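The coarse alignment step described above (selecting image regions by their similarity to the global description) can be illustrated with a minimal sketch. The abstract does not specify the similarity function, the normalization, or the feature extractors, so cosine similarity, a temperature-scaled softmax, and the names `coarse_align`, `cosine`, and `temperature` are all assumptions made for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (assumed similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; the temperature is a hypothetical knob."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_align(regions: List[Vector], sentence: Vector,
                 temperature: float = 0.1) -> Tuple[List[float], Vector]:
    """Hypothetical similarity-based attention over image regions.

    Each region feature is scored against the sentence embedding, the scores
    are normalized into attention weights, and the weighted sum of region
    features is returned as the description-related visual feature.
    """
    weights = softmax([cosine(r, sentence) for r in regions], temperature)
    dim = len(sentence)
    pooled = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, pooled


# Usage: the first region is most similar to the sentence embedding,
# so it receives the largest attention weight.
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, pooled = coarse_align(regions, [1.0, 0.1])
```

A low temperature sharpens the attention toward the best-matching regions; in a trained model the region and sentence embeddings would come from learned image and text encoders rather than hand-written vectors.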
FreeU: Free Lunch in Diffusion U-Net
In this paper, we uncover the untapped potential of diffusion U-Net, which
serves as a "free lunch" that substantially improves the generation quality on
the fly. We initially investigate the key contributions of the U-Net
architecture to the denoising process and identify that its main backbone
primarily contributes to denoising, whereas its skip connections mainly
introduce high-frequency features into the decoder module, causing the network
to overlook the backbone semantics. Capitalizing on this discovery, we propose
a simple yet effective method, termed "FreeU", that enhances generation quality
without additional training or fine-tuning. Our key insight is to strategically
re-weight the contributions sourced from the U-Net's skip connections and
backbone feature maps, to leverage the strengths of both components of the
U-Net architecture. Promising results on image and video generation tasks
demonstrate that FreeU can be readily integrated into existing diffusion
models, e.g., Stable Diffusion, DreamBooth, ModelScope, Rerender and ReVersion,
to improve the generation quality with only a few lines of code. All you need
is to adjust two scaling factors during inference. Project page:
https://chenyangsi.top/FreeU/
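The re-weighting idea above can be sketched in a few lines. In a U-Net decoder, backbone features and skip-connection features are typically concatenated along the channel dimension; this sketch scales the backbone features by a factor `b` and the skip features by a factor `s` before that concatenation. The function names, the uniform scaling, and the example factor values are assumptions for illustration only: the published method applies the two factors more selectively than shown here, so consult the project page for the exact formulation.

```python
from typing import List

# A feature map represented as channels x flattened spatial positions.
FeatureMap = List[List[float]]


def scale(fmap: FeatureMap, factor: float) -> FeatureMap:
    """Multiply every value of every channel by a scalar factor."""
    return [[factor * v for v in channel] for channel in fmap]


def freeu_merge(backbone: FeatureMap, skip: FeatureMap,
                b: float, s: float) -> FeatureMap:
    """Hypothetical FreeU-style merge at inference time.

    b > 1 strengthens the backbone (denoising, semantic) features and
    s < 1 damps the skip (high-frequency) features; the two re-weighted
    maps are then concatenated along the channel axis, as a U-Net
    decoder block would do. No training is involved.
    """
    return scale(backbone, b) + scale(skip, s)


# Usage with illustrative (not recommended) factor values.
merged = freeu_merge([[1.0, 2.0]], [[4.0, 8.0]], b=1.2, s=0.5)
# -> [[1.2, 2.4], [2.0, 4.0]]
```

Because the change happens only where decoder features are merged, it fits the abstract's claim that integration needs only a few lines of code and two inference-time scaling factors.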
MetaFormer Is Actually What You Need for Vision
Transformers have shown great potential in computer vision tasks. A common
belief is that their attention-based token mixer module contributes most to their
competence. However, recent works show that the attention-based module in
Transformers can be replaced by spatial MLPs and the resulting models still
perform quite well. Based on this observation, we hypothesize that the general
architecture of the Transformers, instead of the specific token mixer module,
is more essential to the model's performance. To verify this, we deliberately
replace the attention module in Transformers with an embarrassingly simple
spatial pooling operator to conduct only basic token mixing. Surprisingly, we
observe that the derived model, termed PoolFormer, achieves competitive
performance on multiple computer vision tasks. For example, on ImageNet-1K,
PoolFormer achieves 82.1% top-1 accuracy, surpassing well-tuned Vision
Transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with
35%/52% fewer parameters and 50%/62% fewer MACs. The effectiveness of
PoolFormer verifies our hypothesis and motivates us to introduce the concept of
"MetaFormer", a general architecture abstracted from Transformers without
specifying the token mixer. Based on the extensive experiments, we argue that
MetaFormer is the key player in achieving superior results for recent
Transformer and MLP-like models on vision tasks. This work calls for more
future research dedicated to improving MetaFormer instead of focusing on the
token mixer modules. Additionally, our proposed PoolFormer could serve as a
starting baseline for future MetaFormer architecture design. Code is available
at https://github.com/sail-sg/poolformer.
Comment: CVPR 2022 (Oral).
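The "embarrassingly simple spatial pooling operator" can be sketched as a stride-1 average pool over each token's neighbourhood. As we understand the public PoolFormer code, the input is subtracted from the pooled result because the surrounding MetaFormer block already adds a residual connection; treat that detail, the single-channel grid, and the function name `pool_token_mixer` as assumptions of this sketch rather than a faithful port.

```python
from typing import List

# One channel of tokens laid out on an H x W spatial grid.
Grid = List[List[float]]


def pool_token_mixer(x: Grid, pool_size: int = 3) -> Grid:
    """Pooling token mixer sketch for a single channel.

    Each token is replaced by the average of its pool_size x pool_size
    neighbourhood (stride 1; positions outside the grid are excluded from
    the average rather than counted as zero padding), minus the token
    itself. A real model applies this to every channel independently.
    """
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            neigh = [x[a][b]
                     for a in range(max(0, i - r), min(h, i + r + 1))
                     for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(neigh) / len(neigh) - x[i][j])
        out.append(row)
    return out


# Usage: a single bright token is spread to its neighbours.
mixed = pool_token_mixer([[0.0, 0.0, 0.0],
                          [0.0, 9.0, 0.0],
                          [0.0, 0.0, 0.0]])
# -> [[2.25, 1.5, 2.25], [1.5, -8.0, 1.5], [2.25, 1.5, 2.25]]
```

The operator has no learnable parameters, which is what makes PoolFormer a useful probe: any accuracy it reaches must come from the rest of the MetaFormer architecture (normalization, residual connections, and channel MLPs) rather than from the token mixer.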
Tailoring surface hydrophilicity of porous electrospun nanofibers to enhance capillary and push-pull effects for moisture wicking
In this article, the liquid moisture transport behaviors of dual-layer electrospun nanofibrous mats are reported for the first time. The dual-layer mats consist of a thick layer of hydrophilic polyacrylonitrile (PAN) nanofibers and a thin layer of hydrophobic polystyrene (PS) nanofibers, the latter prepared either with or without interpenetrating nanopores. The mats are coated with polydopamine (PDOPA) to different extents to tailor the water wettability of the PS layer. It is found that, with a large number of nanochannels, the porous PS nanofibers exhibit a stronger capillary effect than the solid PS nanofibers. The capillary motion in the porous PS nanofibers can be further enhanced by slight surface modification with PDOPA while retaining the large hydrophobicity difference between the two layers, inducing a strong push–pull effect that transports water from the PS layer to the PAN layer.
Gene Delivery to Nonhuman Primate Preimplantation Embryos Using Recombinant Adeno-Associated Virus
Delivery of genome editing tools to mammalian zygotes has revolutionized animal modeling. However, mechanical delivery of genes and proteins into zygotes remains challenging for some animal species that are important in biomedical research. Here, an approach is presented that achieves gene delivery and genome editing in nonhuman primate embryos by infecting zygotes with recombinant adeno-associated viruses (rAAVs). Taken together with previous reports by the authors and others, these results suggest that the approach is potentially applicable to a broad range of mammals. In addition to genome editing and animal modeling, this rAAV-based method can facilitate gene function studies in early-stage embryos.